Unsupervised Learning of Dependency Structure for Language Modeling

Authors

  • Jianfeng Gao
  • Hisami Suzuki
Abstract

This paper presents a dependency language model (DLM) that captures linguistic constraints via a dependency structure, i.e., a set of probabilistic dependencies that express the relations between headwords of each phrase in a sentence by an acyclic, planar, undirected graph. Our contributions are three-fold. First, we incorporate the dependency structure into an n-gram language model to capture long distance word dependency. Second, we present an unsupervised learning method that discovers the dependency structure of a sentence using a bootstrapping procedure. Finally, we evaluate the proposed models on a realistic application (Japanese Kana-Kanji conversion). Experiments show that the best DLM achieves an 11.3% error rate reduction over the word trigram model.
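The abstract describes combining an n-gram model with probabilistic headword dependencies. A minimal sketch of that idea, assuming a simple linear interpolation between a trigram probability and a head-dependent probability (the lookup functions, the interpolation weight, and the head-index encoding of the dependency graph are all hypothetical illustrations, not the paper's actual model):

```python
import math

# Illustrative sketch only: score a sentence by interpolating a word-trigram
# probability with a headword-dependency probability, in the spirit of the
# DLM described in the abstract. All probabilities here are toy stand-ins.

def interpolate(p_trigram, p_dependency, lam=0.7):
    """Linear interpolation of the two component probabilities."""
    return lam * p_trigram + (1 - lam) * p_dependency

def sentence_logprob(words, heads, p_tri, p_dep, lam=0.7):
    """
    words: list of tokens
    heads: dict mapping a word's index to its head's index (a toy stand-in
           for the paper's acyclic planar dependency graph)
    p_tri(w, h2, h1): trigram probability lookup (hypothetical)
    p_dep(w, head):   dependency probability lookup (hypothetical)
    """
    logp = 0.0
    for i, w in enumerate(words):
        h1 = words[i - 1] if i >= 1 else "<s>"
        h2 = words[i - 2] if i >= 2 else "<s>"
        head = words[heads[i]] if i in heads else "<root>"
        p = interpolate(p_tri(w, h2, h1), p_dep(w, head), lam)
        logp += math.log(p)
    return logp
```

With both component models returning a constant 0.5, the sentence log-probability reduces to `len(words) * log(0.5)`, which is a quick sanity check that the interpolation is well-formed.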


Similar resources

An Unsupervised Parameter Estimation Algorithm for a Generative Dependency N-gram Language Model

We design a language model based on a generative dependency structure for sentences. The parameter of the model is the probability of a dependency N-gram, which is composed of lexical words with four kinds of extra tags used to model the dependency relation and valence. We further propose an unsupervised expectation-maximization algorithm for parameter estimation, in which all possible dependenc...
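The unsupervised EM estimation sketched in this abstract can be illustrated in miniature: the E-step weights each candidate dependency analysis of a sentence by its probability under the current parameters, and the M-step re-estimates dependency probabilities from the resulting expected counts. The data layout and smoothing constant below are hypothetical simplifications, not the cited algorithm:

```python
from collections import defaultdict

# Toy EM step over latent dependency analyses. Each training item carries a
# list of candidate analyses, where an analysis is a list of
# (dependent, head) pairs. This is an illustration of the E/M idea only.

def em_step(sentences, prob, floor=1e-6):
    counts = defaultdict(float)
    for _, candidates in sentences:
        # E-step: posterior weight of each candidate analysis under `prob`
        weights = []
        for cand in candidates:
            w = 1.0
            for dep, head in cand:
                w *= prob.get((dep, head), floor)  # floor for unseen pairs
            weights.append(w)
        total = sum(weights)
        for cand, w in zip(candidates, weights):
            for pair in cand:
                counts[pair] += w / total  # accumulate expected counts
    # M-step: renormalize expected counts per head
    head_totals = defaultdict(float)
    for (dep, head), c in counts.items():
        head_totals[head] += c
    return {(dep, head): c / head_totals[head]
            for (dep, head), c in counts.items()}
```

Iterating `em_step` sharpens the distribution toward attachments that are consistent across sentences, which is the usual behavior of EM over latent structures.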


Two Approaches for Building an Unsupervised Dependency Parser and Their Other Applications

Much work has been done on building a parser for natural languages, but most of this work has concentrated on supervised parsing. Unsupervised parsing is a less explored area, and unsupervised dependency parser has hardly been tried. In this paper we present two approaches for building an unsupervised dependency parser. One approach is based on learning dependency relations and the other on lea...


Covariance in Unsupervised Learning of Probabilistic Grammars

Probabilistic grammars offer great flexibility in modeling discrete sequential data like natural language text. Their symbolic component is amenable to inspection by humans, while their probabilistic component helps resolve ambiguity. They also permit the use of well-understood, general-purpose learning algorithms. There has been an increased interest in using probabilistic grammars in the Bayes...


Neutralizing Linguistically Problematic Annotations in Unsupervised Dependency Parsing Evaluation

Dependency parsing is a central task in the field of Natural Language Processing (NLP). The task involves the automatic labeling of natural language sentences with dependency structures, such that each word is labeled as the dependent of another word in the sentence (its syntactic head). This formalism is important, both in the linguistic aspect (Mel’čuk, 1988) and in empirical aspects, as it u...


Phrase Dependency Machine Translation with Quasi-Synchronous Tree-to-Tree Features

Recent research has shown clear improvement in translation quality by exploiting linguistic syntax for either the source or target language. However, when using syntax for both languages (“tree-to-tree” translation), there is evidence that syntactic divergence can hamper the extraction of useful rules (Ding and Palmer 2005). Smith and Eisner (2006) introduced quasi-synchronous grammar, a formal...



Publication date: 2003